We propose a new test statistic based on a score process for determining the statistical significance of a 
putative signal that may be a small perturbation to a noisy experimental background. We derive the reference 
distribution for this score test statistic; it has an elegant geometrical interpretation as well as broad applicability. 
We illustrate the technique in the context of a model problem from high-energy particle physics. Monte Carlo 
experimental results confirm that the score test results in a significantly improved rate of signal detection. 

One of the fundamental problems in the analysis of experimental data is determining the statistical
significance of a putative signal. Such a problem can be cast in terms of classical "hypothesis
testing", where a null hypothesis $H_0$ describes the background and an alternative hypothesis $H_1$
characterizes the signal together with the background. A test statistic (a function of the data) is
used to decide whether to reject $H_0$ and conclude that a signal is present.

The hypothesis test concludes that a signal is present whenever the test statistic falls in a critical
region $W$. One is interested in the probability that a signal is found under two scenarios. First,
when the null hypothesis $H_0$ is true, the significance level $\alpha$ is the probability of
incorrectly concluding that a signal is present. Second, when the alternative $H_1$ is true, the power
of the test is the probability that the signal is found. The goal is to construct a test statistic
whose asymptotic distribution (the reference distribution under $H_0$ for large sample size) can be
calibrated accurately and whose associated test has high power at a fixed significance level, such as
$\alpha = 0.01$.

When the two hypotheses are distinct, a powerful technique based on the likelihood ratio test (LRT) is
often used. Suppose $p(x; \theta)$ is a probability density function for a measurement $x$ with a
parameter vector $\theta \in \Theta \subset \mathbb{R}^d$. The joint probability density function
evaluated with $n$ measurements $X$ for an unknown $\theta$ is the likelihood function
$L(\theta|X)$. An effective approach to the problem of choosing between $H_0$ [with likelihood
$L(\theta_0|X)$] and $H_1$ [with likelihood $L(\theta_1|X)$] for explaining the data is to consider
the LRT statistic $\Lambda = L(\hat{\theta}_0|X)/L(\hat{\theta}_1|X)$, where $\hat{\theta}_i$ is the
value of $\theta$ that maximizes $L(\theta|X)$ under $H_i$ [3]. To employ the LRT, the parsimonious
model under $H_0$ (with $s_0$ parameters) must be nested within the more complicated alternative model
under $H_1$ (with $s_1$ parameters). For simple models, under regularity conditions, $-2\log\Lambda$
is distributed as the $\chi^2$ distribution with $(s_1 - s_0)$ degrees of freedom under $H_0$.

When the alternative hypothesis corresponds to a signal that is a perturbation of the background, the
regularity conditions required for this asymptotic theory are violated, since (a) some of the
parameters under $H_0$ are on the boundaries of their region of support and (b) different parameter
values give rise to the same null model. As a result, the LRT has lacked an analytically tractable
reference distribution required to calibrate a test statistic. Such a difficulty occurs in many
practical applications, for example, when testing for a new particle resonance of unknown production
cross section, as the signal strength must be nonnegative. Hence, the LRT must be employed cautiously;
nevertheless, it has been employed in several problems of practical importance where certain required
regularity conditions are violated. An inappropriate application of the LRT statistic can lead to
incorrect scientific conclusions [4,5].

In light of the above difficulties with the LRT, a $\chi^2$ goodness-of-fit test is commonly employed.
However, it typically has less power than might be hoped for, as it does not take into account
information about the anticipated form of the signal. We propose a new test statistic based on a score
process to detect the presence of a signal and present its reference distribution. This score
statistic is closely related to the LRT for sufficiently large sample size.

Consider the model

$$p(x; \eta, \theta) = (1 - \eta)\, f(x) + \eta\, \psi(x; \theta),$$

where $f(x)$ is a specified null density and $\psi(x; \theta)$ is a perturbation density. The
parameter vector $\theta$ is the "location" of the perturbation, and $\eta \in [0, 1]$ measures the
"strength of the perturbation". The null hypothesis of no signal ($H_0 : \eta = 0$) implies that
$p(x; 0, \theta) = f(x)$ for all $x$ independently of $\theta$; hence we are in scenario (b). In
searching for a new particle resonance, for example, one measures the frequency of events as a
function of energy $E$, modeling it by $p(E; \eta, E_0)$, where $f(E)$ characterizes the background
density and $\psi(E; E_0) = [\Gamma/(2\pi)]\,[(E - E_0)^2 + (\Gamma/2)^2]^{-1}$ is the Cauchy
(Breit-Wigner) density describing a resonance centered on $E_0$ with full width at half-maximum
$\Gamma$. In this scenario, $\eta = 0$ under $H_0$, a value on the boundary of $[0, 1]$, and hence
$-2\log\Lambda$ does not have an asymptotic $\chi^2$ distribution under $H_0$. The asymptotic
reference distribution is not analytically tractable, and hence the measured value of the LRT
statistic cannot be used for valid statistical inference.
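
As a concrete illustration of the mixture model, the following minimal sketch codes the Breit-Wigner density and the mixture density. The linear background is the one used in the simulation study later in this letter; the function names are hypothetical, and the Breit-Wigner density is left as defined on the whole real line (it is not renormalized to the window $[0, 2]$ here).

```python
import math


def background_density(E):
    """Linear background f(E) = (1 + 0.3 E) / 2.6 on [0, 2], zero elsewhere."""
    return (1.0 + 0.3 * E) / 2.6 if 0.0 <= E <= 2.0 else 0.0


def breit_wigner(E, E0, gamma):
    """Cauchy (Breit-Wigner) density centred on E0 with FWHM gamma."""
    return (gamma / (2.0 * math.pi)) / ((E - E0) ** 2 + (gamma / 2.0) ** 2)


def mixture_density(E, eta, E0, gamma):
    """p(E; eta, theta) = (1 - eta) f(E) + eta psi(E; theta), theta = (E0, gamma)."""
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    return (1.0 - eta) * background_density(E) + eta * breit_wigner(E, E0, gamma)


# With eta = 0 the resonance parameters drop out of the model entirely,
# which is the non-identifiability described as scenario (b).
same = mixture_density(0.7, 0.0, 1.0, 0.2) == mixture_density(0.7, 0.0, 1.5, 0.4)
```

The last line shows why $\theta$ cannot be estimated under $H_0$: every choice of $(E_0, \Gamma)$ gives the same density when $\eta = 0$.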

A key obstacle to detecting the signal is finding the tail probability. We provide an asymptotic
solution to this problem via a geometric formula [see Eq. (3)]. The relative improvement of the score
test over the $\chi^2$ goodness-of-fit test is particularly salient when the signal is hard to detect
(see Fig. 4). The development of the reference distribution and a flexible computational method makes
it possible to attach probabilistic statements to some of the fundamental problems arising in many
areas of experimental physics.

Pilla and Loader [6] have developed a general theory and a computationally flexible method to
determine the asymptotic reference distribution of a test statistic under $H_0$. Their method is based
on the "score process", indexed by the parameter vector $\theta$ and defined as

$$S(\theta) := \left.\frac{\partial \log\left[\prod_{i=1}^{n} p(E_i; \eta, \theta)\right]}{\partial \eta}\right|_{\eta = 0}$$

for given data $E = (E_1, \ldots, E_n)$. Under $H_0$, the expectation of $S(\theta)$ is 0 for all
$\theta$, while under $H_1$ it has a peak at the true value of $\theta$. Hence, the statistic
$S(\theta)$ is sensitive to the signal of interest. The random variability of $S(\theta)$ can exhibit
significant dependence on the parameter vector $\theta$; hence we consider the normalized score
process defined as

$$S^*(\theta) := \frac{S(\theta)}{\sqrt{n\, C(\theta, \theta)}}, \tag{1}$$

where $n$ is the total number of events observed, and

$$C(\theta, \theta^{\dagger}) := \int \frac{\psi(E; \theta)\, \psi(E; \theta^{\dagger})}{f(E)}\, dE - 1 \tag{2}$$

is the covariance function of $S(\theta)$ for $\theta \in \Theta \subset \mathbb{R}^d$.
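
The form of the score process and of its covariance can be checked directly from the mixture model. The short derivation below is a sketch under the assumption that $\psi(\cdot; \theta)$ integrates to 1 over the support of $f$; the expectations are taken under $H_0$, that is, with $E \sim f$.

```latex
\begin{align*}
S(\theta)
  &= \frac{\partial}{\partial\eta}\sum_{i=1}^{n}
     \log\bigl[(1-\eta)\,f(E_i)+\eta\,\psi(E_i;\theta)\bigr]\Big|_{\eta=0}
   = \sum_{i=1}^{n}\left[\frac{\psi(E_i;\theta)}{f(E_i)}-1\right],\\[4pt]
\mathrm{E}_{H_0}\!\left[\frac{\psi(E;\theta)}{f(E)}-1\right]
  &= \int \psi(E;\theta)\,dE-1 = 0,\\[4pt]
\mathrm{Cov}_{H_0}\!\left[\frac{\psi(E;\theta)}{f(E)}-1,\;
                          \frac{\psi(E;\theta^{\dagger})}{f(E)}-1\right]
  &= \int\frac{\psi(E;\theta)\,\psi(E;\theta^{\dagger})}{f(E)}\,dE-1 .
\end{align*}
```

Since the $n$ events are independent, $\mathrm{Var}_{H_0}[S(\theta)]$ is $n$ times the last expression evaluated at $\theta^{\dagger} = \theta$, which is why dividing by $\sqrt{n\,C(\theta,\theta)}$ gives a process with unit variance at every $\theta$.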



For exposition, we assume that $f(E)$, the density under $H_0$, is completely specified. In practice,
it often contains unknown parameters. In this scenario, the covariance function
$C(\theta, \theta^{\dagger})$ in Eq. (2) for $S(\theta)$ needs modification. Pilla and Loader derive
an appropriate $C(\theta, \theta^{\dagger})$ under estimated parameters.

For testing the hypotheses $H_0 : \eta = 0$ (no signal) versus $H_1 : \eta > 0$ (signal is present),
consider the test statistic $T := \sup_{\theta} S^*(\theta)$ for
$\theta \in \Theta \subset \mathbb{R}^d$. It is concluded that a signal is present if $T$ exceeds a
critical level $c \in \mathbb{R}$. The problem now is to determine the reference distribution of $T$,
so that $c$ can be chosen to achieve a specified significance level $\alpha$.
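
The statistic $T$ can be approximated numerically by maximizing the normalized score over a grid. The sketch below does this for the resonance example with $\Gamma$ held fixed and $E_0$ varied. Several choices are assumptions made here for illustration rather than the authors' implementation: the Breit-Wigner density is renormalized to the window $[0, 2]$ so that the score has mean zero under $H_0$, the variance integral is evaluated with a midpoint rule, the supremum is replaced by a maximum over a user-supplied grid, and all function names are hypothetical.

```python
import math

LO, HI = 0.0, 2.0  # energy window of the simulation example


def f(E):
    """Linear background density on [LO, HI]."""
    return (1.0 + 0.3 * E) / 2.6


def psi(E, E0, gamma):
    """Breit-Wigner density renormalised to integrate to 1 on [LO, HI] (assumption)."""
    half = gamma / 2.0
    mass = (math.atan((HI - E0) / half) - math.atan((LO - E0) / half)) / math.pi
    return (gamma / (2.0 * math.pi)) / ((E - E0) ** 2 + half ** 2) / mass


def variance_term(E0, gamma, steps=4000):
    """C(theta, theta) = int psi^2 / f dE - 1, evaluated by the midpoint rule."""
    h = (HI - LO) / steps
    total = 0.0
    for i in range(steps):
        E = LO + (i + 0.5) * h
        total += psi(E, E0, gamma) ** 2 / f(E)
    return total * h - 1.0


def normalized_score(data, E0, gamma):
    """S*(theta) = sum_i [psi(E_i) / f(E_i) - 1] / sqrt(n C(theta, theta))."""
    n = len(data)
    score = sum(psi(E, E0, gamma) / f(E) - 1.0 for E in data)
    return score / math.sqrt(n * variance_term(E0, gamma))


def sup_statistic(data, E0_grid, gamma):
    """Return (T, argmax E0), with the supremum taken over the grid only."""
    return max((normalized_score(data, E0, gamma), E0) for E0 in E0_grid)


# Toy data: 20 evenly spread events plus an excess of 10 events at E = 1.
data = [0.05 + 0.1 * k for k in range(20)] + [1.0] * 10
grid = [0.2 * k for k in range(1, 10)]
T, best_E0 = sup_statistic(data, grid, gamma=0.2)
```

A finer grid, or a numerical optimizer started from the best grid point, would approximate the supremum more closely; the grid is used here only to keep the sketch short.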

Under $H_0$, $S^*(\theta)$ converges in distribution to a Gaussian process $Z(\theta)$ with mean 0 and
covariance function $C(\theta, \theta^{\dagger})/\sqrt{C(\theta, \theta)\, C(\theta^{\dagger}, \theta^{\dagger})}$
as $n \to \infty$ [6]. The reference distribution of $T$ converges to that of
$\sup_{\theta} Z(\theta)$ as $n \to \infty$ for $\theta \in \Theta \subset \mathbb{R}^d$. Except in
special cases, this distribution cannot be expressed analytically. However, a good asymptotic solution
to the tail probability $P(\sup_{\theta} Z(\theta) \ge c)$, where $c \in \mathbb{R}$ is large, can be
obtained via the volume-of-tube formula [7-9]. The volume-of-tube formula provides an elegant
geometric approach for solving problems in simultaneous inference [10] by reducing the evaluation of
tail probabilities to that of finding the $(J - 1)$-dimensional volume of the set of points lying
within a distance $r$ of the curve ($d = 1$) or manifold ($d \ge 2$) on the surface of the unit sphere
in $J$ dimensions for some integer $J$ (see Fig. 1).

FIG. 1: (color) Tube around a one-dimensional manifold with boundaries, embedded in
$S^2 \subset \mathbb{R}^3$.

Suppose $\xi(\theta)$ defines a manifold for $\theta \in \Theta$ on the surface of a
$(J - 1)$-dimensional unit sphere $S^{(J-1)}$. Fig. 1 shows a "tube" of radius $r$ around a manifold
$\xi(\theta)$ embedded in $S^{(J-1)} \subset \mathbb{R}^J$ with boundary caps. We represent the
Gaussian random field $Z(\theta)$ via the Karhunen-Loève expansion [11] as
$Z(\theta) = \sum_{k=1}^{\infty} \vartheta_k\, \xi_k(\theta) = \langle \vartheta, \xi(\theta) \rangle$,
where $\langle \cdot, \cdot \rangle$ denotes the inner product, $\vartheta$ and $\xi$ are vectors and
$\vartheta_k \sim N(0, 1)$. If the Karhunen-Loève expansion is terminated after $J$ terms, then the
following relation between the manifold $\xi(\theta)$ embedded in $S^{(J-1)} \subset \mathbb{R}^J$ and
the Gaussian random field $Z(\theta)$ holds:

$$P\left(\sup_{\theta \in \Theta} Z(\theta) \ge c\right) = \int_{c^2}^{\infty} P\left(\sup_{\theta \in \Theta} \langle U, \xi(\theta) \rangle \ge w\right) h_J(y)\, dy,$$

where $U = (U_1, \ldots, U_J)$ is uniformly distributed on $S^{(J-1)}$,
$\xi = (\xi_1, \ldots, \xi_J)$, $w = c/\sqrt{y}$, and $h_J(y)$ is a $\chi^2$ density with $J$ degrees
of freedom. The uniformity property enables finding the $P(\cdot)$ in the integrand via the
volume-of-tube formula. Note that $r^2 = 2(1 - w)$.
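
The two relations used above, $w = c/\sqrt{y}$ and $r^2 = 2(1 - w)$, follow from elementary geometry on the unit sphere. The sketch below spells this out under the assumption that $U = \vartheta/\|\vartheta\|$ and $y = \|\vartheta\|^2$, with $\vartheta$ truncated to its first $J$ components; this identification is the standard polar decomposition of a Gaussian vector and is stated here as an assumption about the notation.

```latex
\begin{align*}
\langle\vartheta,\xi(\theta)\rangle \ge c
  &\iff \langle U,\xi(\theta)\rangle \ge \frac{c}{\sqrt{y}} = w,
  \qquad U=\frac{\vartheta}{\|\vartheta\|},\quad y=\|\vartheta\|^{2}\sim\chi^{2}_{J},\\[4pt]
\|U-\xi(\theta)\|^{2}
  &= \|U\|^{2}+\|\xi(\theta)\|^{2}-2\,\langle U,\xi(\theta)\rangle
   = 2\bigl(1-\langle U,\xi(\theta)\rangle\bigr),\\[4pt]
\langle U,\xi(\theta)\rangle \ge w
  &\iff \|U-\xi(\theta)\|^{2}\le 2(1-w)=r^{2}.
\end{align*}
```

Because $\langle U, \xi(\theta)\rangle \le 1$, the event is empty unless $w \le 1$, that is, unless $y \ge c^2$, which accounts for the lower limit of the integral. The independence of $U$ and $y$ is what allows the inner probability to be computed for a uniformly distributed $U$.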

Geometrically, $P(\sup_{\theta} \langle U, \xi(\theta) \rangle \ge w)$ is the probability that $U$
lies within a tube of radius $r$ around $\xi(\theta)$ on the surface of $S^{(J-1)}$ and equals the
volume of the tube around $\xi(\theta)$ divided by the surface area of $S^{(J-1)}$. In effect,
constructing a test of significance level 5% is equivalent to choosing the rejection set covering 5%
of $S^{(J-1)}$. Therefore, finding critical values of the test statistic $T$ is equivalent to finding
a $(J - 1)$-dimensional volume of the tube.



The results of Hotelling-Weyl-Naiman [7-9] imply that for $w \approx 1$, the tail probability is
expressible as a weighted sum of $\chi^2$ distributions, with $(d + 1)$ terms and coefficients that
depend on the geometry of the $d$-dimensional manifold $\xi(\theta)$. The results of Pilla and Loader
[6] provide an expansion of the distribution of $\sup_{\theta} Z(\theta)$ in terms of the $\chi^2$
probabilities:

$$P\left(\sup_{\theta \in \Theta} Z(\theta) \ge c\right) = \sum_{k=0}^{d} \frac{\zeta_k}{A_k A_{d+1-k}}\, P\left(\chi^2_{d+1-k} \ge c^2\right) + o\left(c^{-1/2} e^{-c^2/2}\right) \quad \text{as } c \to \infty, \tag{3}$$

where $A_0 = 1$ and $A_k = 2\pi^{k/2}/\Gamma(k/2)$ for $k \ge 1$. The constants
$\zeta_0, \ldots, \zeta_d$ depend on the geometry of $\xi(\theta)$; $\zeta_0$ is the area of the
manifold and $\zeta_1$ is the length of the boundary of the manifold. These can be represented
explicitly in terms of the covariance function:

$$\zeta_0 = \int_{\theta \in \Theta} [C(\theta, \theta)]^{-(d+1)/2}\, \sqrt{D(\theta, \theta)}\; d\theta,$$

where $D(\theta, \theta)$ is defined as

$$D(\theta, \theta) := \left.\det\begin{pmatrix} C(\theta, \theta^{\dagger}) & \nabla_1 C(\theta, \theta^{\dagger}) \\ \nabla_2 C(\theta, \theta^{\dagger}) & \nabla_1 \nabla_2 C(\theta, \theta^{\dagger}) \end{pmatrix}\right|_{\theta^{\dagger} = \theta},$$

with $\nabla_1$ and $\nabla_2$ as the partial derivative operators with respect to $\theta$ and
$\theta^{\dagger}$, respectively. The expression for $\zeta_1$ is similar except that integration is
over the boundary of the manifold. The remaining constants involve curvature of the manifold and its
boundaries, and become progressively more complex. However, for practical problems the first few terms
will suffice, and an implementation of the first four terms is described in [12]. When the reference
distribution can be approximated by a $\chi^2$ distribution, a tabulated value can be employed to
calibrate the test statistic, whereas the geometric constants appearing in the above tail probability
evaluation depend on the problem at hand. With modern computing, it is not difficult to compute them
numerically [12].

In many applications, including the one considered in this letter, one is interested in the
probabilities of rare events (i.e., $c \to \infty$). In this case, the terms in Eq. (3) are of
descending size, and the error term is asymptotically negligible.
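
Once the geometric constants are known, the leading terms of Eq. (3) are straightforward to evaluate and to invert for a critical value. The sketch below is one way to do so using only the Python standard library. It assumes the coefficient of the $k$-th term is $\zeta_k/(A_k A_{d+1-k})$, drops the $o(\cdot)$ remainder, and takes the constants as inputs; the function names and the numerical values of the constants in the usage lines are hypothetical placeholders, not values from this letter.

```python
import math


def chi2_sf(x, k):
    """P(chi^2_k >= x) for integer k >= 1, built up with the recurrence
    Q_{k+2}(x) = Q_k(x) + (x/2)^{k/2} exp(-x/2) / Gamma(k/2 + 1)."""
    if x <= 0.0:
        return 1.0
    if k % 2 == 1:
        q, j = math.erfc(math.sqrt(x / 2.0)), 1   # Q_1
    else:
        q, j = math.exp(-x / 2.0), 2              # Q_2
    while j < k:
        q += (x / 2.0) ** (j / 2.0) * math.exp(-x / 2.0) / math.gamma(j / 2.0 + 1.0)
        j += 2
    return q


def sphere_area(k):
    """A_0 = 1 and A_k = 2 pi^{k/2} / Gamma(k/2) for k >= 1."""
    return 1.0 if k == 0 else 2.0 * math.pi ** (k / 2.0) / math.gamma(k / 2.0)


def tube_tail_probability(c, zetas):
    """Leading terms of the expansion; zetas = [zeta_0, ..., zeta_d]."""
    d = len(zetas) - 1
    return sum(
        zeta / (sphere_area(k) * sphere_area(d + 1 - k)) * chi2_sf(c * c, d + 1 - k)
        for k, zeta in enumerate(zetas)
    )


def critical_value(alpha, zetas, lo=0.0, hi=20.0, iters=200):
    """Bisection for the level c at which the approximate tail probability is alpha."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tube_tail_probability(mid, zetas) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical constants for a curve (d = 1): length pi and two end points.
example_zetas = [math.pi, 2.0]
c_05 = critical_value(0.05, example_zetas)
```

As a sanity check on the coefficients, a manifold consisting of a single point ($d = 0$, $\zeta_0 = 1$) reduces the formula to the one-sided Gaussian tail $P(Z \ge c)$, as it should.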



See separate file for Figure 2.

FIG. 2: (color) Surface of the process $S^*(\theta)$ as a function of $\theta = (E_0, \Gamma)$.



We demonstrate the power of the score test with a Monte Carlo simulation experiment drawn from
high-energy physics. In our simulation, we consider measurements of energy in a region
$E \in [0, 2]$ in which the background (null) density is modeled as linear, with the specific form
$f(E) = (1/2.6)(1 + 0.3E)$. The resonance is modeled by a Breit-Wigner density function. The
parameters for this problem are modeled following an example in Roe [13].

To examine the effectiveness of the test $T$ in detecting a signal, we perform Monte Carlo analyses of
10,000 samples, each with a size of $n = 1000$ events spread over 50 bins, at the values
$\Gamma = 0.2$ and $E_0 = 1$. For a single simulated dataset, Fig. 2 shows the normalized score
surface as a function of $E_0$ and $\Gamma$. It is clear that the maximum is achieved at $E_0 = 1$
irrespective of the value of $\Gamma$.
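
A single simulated dataset of this kind can be generated as follows. This is a minimal sketch, not the authors' simulation code: the background is drawn by inverting its cumulative distribution function, the resonance is drawn from the Cauchy quantile function and rejected until it falls in $[0, 2]$ (truncation to the window is an assumption made here), and the function names and the seed are hypothetical.

```python
import math
import random


def sample_background(rng):
    """Inverse-CDF draw from f(E) = (1 + 0.3 E) / 2.6 on [0, 2]."""
    u = rng.random()
    # F(E) = (E + 0.15 E^2) / 2.6 = u  =>  0.15 E^2 + E - 2.6 u = 0.
    return (-1.0 + math.sqrt(1.0 + 1.56 * u)) / 0.3


def sample_signal(rng, E0, gamma):
    """Breit-Wigner draw, repeated until it lands in [0, 2] (assumed truncation)."""
    while True:
        E = E0 + 0.5 * gamma * math.tan(math.pi * (rng.random() - 0.5))
        if 0.0 <= E <= 2.0:
            return E


def simulate_events(n, eta, E0, gamma, seed=0):
    """Draw n events from (1 - eta) f + eta psi."""
    rng = random.Random(seed)
    return [
        sample_signal(rng, E0, gamma) if rng.random() < eta else sample_background(rng)
        for _ in range(n)
    ]


def bin_counts(events, bins=50, lo=0.0, hi=2.0):
    """Histogram the events into equal-width bins on [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for E in events:
        counts[min(int((E - lo) / width), bins - 1)] += 1
    return counts


events = simulate_events(n=1000, eta=0.1, E0=1.0, gamma=0.2, seed=1)
counts = bin_counts(events)
```

Repeating the call with different seeds, once with `eta=0.0` and once with `eta=0.1`, gives the null and alternative samples from which rejection rates and power can be estimated.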

FIG. 3: (color) Histograms of the simulated null ($\eta = 0$) density (red) and alternative
($\eta = 0.1$) density (yellow) of the test statistic $T$, with a superimposed (blue) asymptotic null
density [derivative of Eq. (3)] for a fixed $\Gamma$. The horizontal axis is the maximum of the
normalized score process. The purple vertical bar is the cutoff for the test statistic $T$ at the 5%
false positive rate calculated via the volume-of-tube formula [Eq. (3) with $d = 1$].

Fig. 3 shows histograms over 10,000 samples under $H_0 : \eta = 0$ and $H_1 : \eta = 0.1$ for a fixed
$\Gamma$. The former histogram confirms that the hypothesis of no signal is rejected about 5% of the
time. The asymptotic null density [derivative of Eq. (3) with $d = 1$] agrees with the simulated null
distribution, as expected.

When both $E_0$ and $\Gamma$ are estimated, Fig. 4 shows that the power of detection increases as the
signal strength $\eta$ increases. Our test statistic $T$ is significantly more powerful than the
$\chi^2$ goodness-of-fit test in detecting the signal. The asymptotic tail probability result obtained
via the volume-of-tube formula [Eq. (3)] is elegant, simple and powerful in distinguishing the signal
from the random fluctuations in data.
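
For comparison, the $\chi^2$ goodness-of-fit statistic on binned data can be sketched as below. The letter does not state which form of the statistic was used, so Pearson's form is an assumption here, as are the function names; the expected counts come from the fully specified linear background on $[0, 2]$ with 50 equal-width bins.

```python
def background_cdf(E):
    """CDF of f(E) = (1 + 0.3 E) / 2.6 on [0, 2]."""
    return (E + 0.15 * E * E) / 2.6


def expected_counts(n, bins=50, lo=0.0, hi=2.0):
    """Expected bin counts under the background-only hypothesis."""
    width = (hi - lo) / bins
    edges = [lo + j * width for j in range(bins + 1)]
    return [n * (background_cdf(b) - background_cdf(a)) for a, b in zip(edges, edges[1:])]


def pearson_chi2(observed, expected):
    """Pearson statistic sum (O - E)^2 / E over the bins."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


expected = expected_counts(1000)
# A toy excess of 30 events in the bin just above E = 1 (bin index 25).
observed = list(expected)
observed[25] += 30.0
statistic = pearson_chi2(observed, expected)
```

With a fully specified background, the statistic would be referred to a $\chi^2$ distribution with 49 degrees of freedom. It treats all bins alike and uses no information about where or how wide a resonance might be, which is the loss of power discussed above.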

FIG. 4: (color) Power comparison of the $\chi^2$ goodness-of-fit test (blue) and normalized score test
$T$ (red) for $d = 2$ at $\alpha = 0.05$ (dashed) and $\alpha = 0.01$ (solid), calculated via the
volume-of-tube formula, based on 10,000 simulations for binned data.